Practical Ambiguity Detection for Context-Free Grammars

Author

  • H. J. S. Basten
Abstract

The use of unconstrained context-free grammars for generalized parsing techniques has several advantages over traditional grammar classes, but comes with the danger of undiscovered ambiguities. The ambiguity problem for these grammars is undecidable in the general case, but this does not have to be a problem in practice. Our goal is to find ambiguity detection techniques that have sufficient precision and performance to make them suitable for practical use on realistic grammars. We give a short overview of related work, and propose new directions for improvement.

1 Problem Description and Motivation

Generalized parsing techniques allow the use of the entire class of context-free grammars (CFGs) for the specification of the syntax of programming languages. This has several advantages. First, it allows for modular syntax definitions, which simplifies grammar development and enables reuse. Second, it grants total freedom in structuring a grammar to best fit its intended use. Grammars do not have to be squeezed into LL, LALR or LR(k) form, for instance.

Unfortunately, using unconstrained context-free grammars comes with the danger of ambiguities. A grammar is ambiguous if one or more sentences in its language have multiple parse trees. The semantics of a sentence is usually based upon the structure of its parse tree, so an ambiguous sentence can have multiple meanings. This often indicates a grammar bug which should be avoided. However, in some cases a grammar is intended to contain some degree of ambiguity. This is the case in reverse engineering, for instance, where certain legacy languages can only be disambiguated with type checking after parsing. In both cases it is important to know the sources of ambiguity in the developed grammar, so they can be resolved or verified.

Unfortunately, detecting the (un)ambiguity of a grammar is undecidable in the general case [7, 10, 9]. However, this does not necessarily have to be a problem in practice. Several Ambiguity Detection Methods (ADMs) exist that approach the problem from different angles, all with their own strengths and weaknesses. Because of the undecidability of the problem there is a general tradeoff between precision and performance/termination. The challenge for all ADMs is to give the most precise and understandable answer in the time available. The current state of the art is not yet sufficiently advanced to be practical on realistic grammars, especially the larger ones.

2 Brief Overview of Related Work

Existing ADMs can roughly be divided into two categories: exhaustive methods and approximative ones. Methods in the first category exhaustively search the language of a grammar for ambiguous sentences. This so-called sentence generation is applied by [11, 8, 13, 1]. These methods are 100% accurate, but a problem is that they never terminate if the grammar's language is of infinite size, which usually is the case. They do produce the most precise and useful ambiguity reports, namely ambiguous sentences and their parse trees.

Approximative methods sacrifice accuracy to be able to always finish in finite time. They search an approximation of the grammar for possible ambiguity. The methods described in [12, 6] both apply conservative approximation to never miss ambiguities. The downside of this is that when they do find ambiguities, it is hard to verify whether or not these are false positives.

In [2] we compared the practical usability of several ADMs on a set of grammars for real-world programming languages. It turned out that the exhaustive sentence generator AMBER [13] was the most practical due to its exact reports and reasonable performance. However, its performance was still unsatisfactory for finding realistic ambiguities in longer sentences. The approximative Noncanonical Unambiguity test [12] had a reasonably high accuracy, but it is only able to assess the ambiguity of a grammar as a whole. Its reports might point out sources of individual ambiguities, but these can be hard to understand.
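
To make the idea of an exhaustive search for ambiguous sentences concrete, the following is a minimal Python sketch of a length-bounded search. It is not the algorithm of AMBER or of any other cited tool: it simply enumerates every token string up to a given length and counts its parse trees. The function names, the grammar encoding (a dict from nonterminal to a list of right-hand-side tuples) and the example grammar are assumptions made for illustration. The sketch also assumes a grammar without empty productions and without cyclic unit productions, since otherwise the counting would not terminate.

```python
from functools import lru_cache
from itertools import product


def count_parse_trees(grammar, start, sentence):
    """Count the parse trees of `sentence` (a sequence of tokens) from `start`.

    Assumes (for this sketch only) that no production is empty and that
    unit productions do not form a cycle.
    """
    tokens = tuple(sentence)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        # Symbols that are not keys of the grammar are treated as terminals.
        if symbol not in grammar:
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(count_sequence(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_sequence(rhs, i, j):
        if len(rhs) == 1:
            return count_symbol(rhs[0], i, j)
        total = 0
        # The first symbol covers tokens[i:k]; every remaining symbol
        # needs at least one token, which bounds k from above.
        for k in range(i + 1, j - len(rhs) + 2):
            left = count_symbol(rhs[0], i, k)
            if left:
                total += left * count_sequence(rhs[1:], k, j)
        return total

    return count_symbol(start, 0, len(tokens)) if tokens else 0


def find_ambiguous_sentences(grammar, start, max_length):
    """Yield (sentence, tree_count) for sentences with more than one tree."""
    terminals = sorted({symbol
                        for alternatives in grammar.values()
                        for rhs in alternatives
                        for symbol in rhs
                        if symbol not in grammar})
    for length in range(1, max_length + 1):
        for candidate in product(terminals, repeat=length):
            trees = count_parse_trees(grammar, start, candidate)
            if trees > 1:
                yield candidate, trees


if __name__ == "__main__":
    # Hypothetical example grammar: E -> E + E | a
    expr = {"E": [("E", "+", "E"), ("a",)]}
    for sentence, trees in find_ambiguous_sentences(expr, "E", 5):
        print(" ".join(sentence), trees)  # prints: a + a + a 2
```

The length bound is what makes this sketch terminate. Without it the search runs forever on any unambiguous grammar with an infinite language, which is the termination problem described above. The cost also grows exponentially with the bound, which illustrates why longer ambiguous sentences are hard to reach this way.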


Similar resources

The Usability of Ambiguity Detection Methods for Context-Free Grammars

One way of verifying a grammar is the detection of ambiguities. Ambiguities are not always unwanted, but they can only be controlled if their sources are known. Unfortunately, the ambiguity problem for context-free grammars is undecidable in the general case. Various ambiguity detection methods (ADMs) exist, but they can never be perfect. In this paper we explore three ADMs to test whether they...

Ambiguity Detection: Scaling towards Scannerless

Static ambiguity detection would be an important aspect of language workbenches for textual software languages. The challenge is that automatic ambiguity detection of context-free grammars is undecidable. Sophisticated approximations and optimizations do exist, but these do not scale to grammars for so-called “scannerless parsers”, as of yet. We extend previous work on ambiguity detection for c...

Ambiguity Detection: Scaling to Scannerless

Static ambiguity detection would be an important aspect of language workbenches for textual software languages. However, the challenge is that automatic ambiguity detection in context-free grammars is undecidable in general. Sophisticated approximations and optimizations do exist, but these do not scale to grammars for so-called “scannerless parsers”, as of yet. We extend previous work on ambig...

1st Doctoral Symposium of the International Conference on Software Language

The use of unconstrained context-free grammars for generalized parsing techniques has several advantages over traditional grammar classes, but comes with the danger of undiscovered ambiguities. The ambiguity problem for these grammars is undecidable in the general case, but this does not have to be a problem in practice. Our goal is to find ambiguity detection techniques that have sufficient pr...

Detecting Ambiguity in Programming Language Grammars

Ambiguous Context Free Grammars (CFGs) are problematic for programming languages, as they allow inputs to be parsed in more than one way. In this paper, we introduce a simple non-deterministic search-based approach to ambiguity detection which non-exhaustively explores a grammar in breadth for ambiguity. We also introduce two new techniques for generating random grammars – Boltzmann sampling an...

Conservative Ambiguity Detection in Context-Free Grammars

The ability to detect ambiguities in context-free grammars is vital for their use in several fields, but the problem is undecidable in the general case. We present a safe, conservative approach, where approximations cannot result in overlooked ambiguous cases. We analyze the complexity of its use, and compare it with other ambiguity detection methods.


Publication date: 2010